MDP Models


A Novel MDP Decomposition Framework for Scalable UAV Mission Planning in Complex and Uncertain Environments

Quamar, Md Muzakkir, Nasir, Ali, ELFerik, Sami

arXiv.org Artificial Intelligence

This paper presents a scalable and fault-tolerant framework for unmanned aerial vehicle (UAV) mission management in complex and uncertain environments. The proposed approach addresses the computational bottleneck inherent in solving large-scale Markov Decision Processes (MDPs) by introducing a two-stage decomposition strategy. In the first stage, a factor-based algorithm partitions the global MDP into smaller, goal-specific sub-MDPs by leveraging domain-specific features such as goal priority, fault states, spatial layout, and energy constraints. In the second stage, a priority-based recombination algorithm solves each sub-MDP independently and integrates the results into a unified global policy, using a meta-policy for conflict resolution. Importantly, a theoretical analysis shows that, under mild probabilistic independence assumptions, the combined policy is provably equivalent to the optimal global MDP policy. By decomposing large MDPs into tractable subproblems with this equivalence guarantee, the framework enables real-time policy updates in complex mission environments. Extensive simulations validate the effectiveness of the method, demonstrating orders-of-magnitude reductions in computation time without sacrificing mission reliability or policy optimality. The result is a practical and robust foundation for scalable decision-making in real-time UAV mission execution.
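The two-stage pipeline this abstract describes can be sketched in miniature with tabular value iteration: solve goal-specific sub-MDPs independently, then recombine their policies with a priority rule. This is an illustrative toy, not the paper's algorithm: the corridor dynamics, the two goals, and the priority weights are all invented, and the factor-based partitioning and meta-policy conflict resolution are reduced to a single argmax.

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-8):
    """Solve a small MDP with transitions P[a, s, s'] and rewards R[s, a]."""
    V = np.zeros(R.shape[0])
    while True:
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        V_new = Q.max(axis=1)
        if np.abs(V_new - V).max() < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new

# Shared dynamics: a 5-state corridor with deterministic left/right moves.
n = 5
P = np.zeros((2, n, n))
for s in range(n):
    P[0, s, max(s - 1, 0)] = 1.0      # action 0: move left
    P[1, s, min(s + 1, n - 1)] = 1.0  # action 1: move right

# Stage 1: factor the task into goal-specific sub-MDPs
# (goal A pays off at state 0, goal B at state 4).
R_a = np.zeros((n, 2)); R_a[0, :] = 1.0
R_b = np.zeros((n, 2)); R_b[4, :] = 1.0
V_a, pi_a = value_iteration(P, R_a)
V_b, pi_b = value_iteration(P, R_b)

# Stage 2: priority-based recombination -- in each state, follow the
# sub-policy whose priority-weighted value is largest (goal A outranks B).
priority = np.stack([2.0 * V_a, 1.0 * V_b])
meta_policy = np.where(priority.argmax(axis=0) == 0, pi_a, pi_b)
```

Because goal A outranks goal B everywhere in this toy, the meta-policy simply adopts goal A's sub-policy; with state-dependent priorities the recombination would switch between sub-policies per state.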



Solving Finite-Horizon MDPs via Low-Rank Tensors

Rozada, Sergio, Orejuela, Jose Luis, Marques, Antonio G.

arXiv.org Artificial Intelligence

We study the problem of learning optimal policies in finite-horizon Markov Decision Processes (MDPs) using low-rank reinforcement learning (RL) methods. In finite-horizon MDPs the policies, and therefore the value functions (VFs), are not stationary. This aggravates the challenges of high-dimensional MDPs, which already suffer from the curse of dimensionality and high sample complexity. To address these issues, we propose modeling the VFs of finite-horizon MDPs as low-rank tensors, enabling a scalable representation that renders the problem of learning optimal policies tractable. We introduce an optimization-based framework for solving the Bellman equations with low-rank constraints, along with block-coordinate descent (BCD) and block-coordinate gradient descent (BCGD) algorithms, both with theoretical convergence guarantees. For scenarios where the system dynamics are unknown, we adapt the proposed BCGD method to estimate the VFs from sampled trajectories. Numerical experiments demonstrate that the proposed framework reduces computational demands in controlled synthetic scenarios and in more realistic resource allocation problems.
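To make the "non-stationary VFs as a low-rank tensor" idea concrete, the following sketch computes the exact horizon-indexed value functions of a toy grid MDP by backward induction and then checks how well a low-rank truncation compresses them. It uses a plain truncated SVD of the unfolded tensor in place of the paper's BCD/BCGD solvers; the grid, horizon, and reward are invented for illustration.

```python
import numpy as np

# Finite-horizon MDP on an X-by-Y grid: the agent moves one step along
# either axis (clipped at the walls) and earns reward 1 for every step
# it spends on the goal corner.
X, Y, H = 6, 6, 8
reward = lambda x, y: float(x == X - 1 and y == Y - 1)

# Backward induction: the optimal VFs V[h] are non-stationary in h.
V = np.zeros((H + 1, X, Y))
for h in reversed(range(H)):
    for x in range(X):
        for y in range(Y):
            moves = [(min(x + 1, X - 1), y), (max(x - 1, 0), y),
                     (x, min(y + 1, Y - 1)), (x, max(y - 1, 0))]
            V[h, x, y] = reward(x, y) + max(V[h + 1, nx, ny] for nx, ny in moves)

# Low-rank view: unfold the (H+1, X, Y) tensor along the horizon mode and
# measure how much of its energy a rank-2 truncation already captures.
M = V.reshape(H + 1, X * Y)
U, s, Vt = np.linalg.svd(M, full_matrices=False)
rank2 = (U[:, :2] * s[:2]) @ Vt[:2]
rel_err = np.linalg.norm(M - rank2) / np.linalg.norm(M)
```

In this toy the value tensor has the closed form V[h, x, y] = max(0, (H - h) - d), with d the Manhattan distance to the goal, so the horizon-unfolded matrix is highly compressible and the rank-2 relative error is small.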



Optimized Task Assignment and Predictive Maintenance for Industrial Machines using Markov Decision Process

Nasir, Ali, Mekid, Samir, Sawlan, Zaid, Alsawafy, Omar

arXiv.org Artificial Intelligence

The importance of predictive maintenance is well recognized in the industrial sector for several reasons: it reduces machine downtime, helps cut production costs, and extends the life of machines. Consequently, predictive maintenance is one of the key areas of research in the scientific community. Initially, predictive maintenance was time-based, but later on, with advances in sensing technology, condition-based maintenance (CBM) gained more popularity. Maintenance of machine tools involves two key stages: diagnosis and prognosis. Prognosis deals with predicting the remaining useful life (RUL) of the machine, whereas diagnosis is concerned with detecting and identifying faults in the machine. Major approaches to prognosis include data-based, knowledge-based, and physics-based (model-based) approaches; diagnosis, on the other hand, follows centralized or distributed approaches [1]. Key challenges in predictive maintenance include 1) dealing with noisy sensor data, 2) uncertainty in operating conditions, and 3) the diversity of tasks assigned to the machine. A comparison between time-based and condition-based maintenance strategies is presented in [2].
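A minimal CBM-style decision model can be written down as a three-state MDP and solved by value iteration. The sketch below is hypothetical (all transition probabilities and costs are invented), but it shows the kind of operate-vs-maintain trade-off such a formulation captures.

```python
import numpy as np

# Toy condition-based maintenance MDP: states {0: good, 1: worn, 2: failed},
# actions {0: operate, 1: maintain}. All numbers are illustrative only.
P = np.array([
    # action 0: operate -- the machine degrades stochastically
    [[0.9, 0.1, 0.0],
     [0.0, 0.7, 0.3],
     [0.0, 0.0, 1.0]],   # failed is absorbing without maintenance
    # action 1: maintain -- restores the machine to good condition
    [[1.0, 0.0, 0.0],
     [1.0, 0.0, 0.0],
     [1.0, 0.0, 0.0]],
])
R = np.array([
    [10.0,  -2.0],   # good: operating is productive, maintenance costs
    [ 6.0,  -2.0],   # worn: less productive
    [ 0.0, -10.0],   # failed: repair is expensive, operating yields nothing
])

gamma = 0.95
V = np.zeros(3)
for _ in range(2000):                      # value iteration to convergence
    Q = R + gamma * np.einsum("ast,t->sa", P, V)
    V = Q.max(axis=1)
policy = Q.argmax(axis=1)                  # greedy policy from final Q
```

With these numbers the optimal policy operates while the machine is good but maintains as soon as it is worn, i.e. it pays the small maintenance cost to avoid the expensive absorbing failure state.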


Towards Transparent Robotic Planning via Contrastive Explanations

Chen, Shenghui, Boggess, Kayla, Feng, Lu

arXiv.org Artificial Intelligence

Providing explanations of chosen robotic actions can help to increase the transparency of robotic planning and improve users' trust. Social sciences suggest that the best explanations are contrastive, explaining not just why one action is taken, but why one action is taken instead of another. We formalize the notion of contrastive explanations for robotic planning policies based on Markov decision processes, drawing on insights from the social sciences. We present methods for the automated generation of contrastive explanations with three key factors: selectiveness, constrictiveness, and responsibility. The results of a user study with 100 participants on the Amazon Mechanical Turk platform show that our generated contrastive explanations can help to increase users' understanding and trust of robotic planning policies while reducing users' cognitive burden.
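At its core, a contrastive explanation for an MDP policy contrasts the chosen action (the fact) with an alternative (the foil). The sketch below reduces this to a Q-value comparison; the Q-table, state, and action names are hypothetical, and the paper's selectiveness, constrictiveness, and responsibility machinery is not modeled.

```python
import numpy as np

def contrastive_explanation(Q, state, fact, foil, actions):
    """Explain why `fact` was chosen over `foil` in `state` by contrasting
    their action values -- a minimal fact-vs-foil comparison only."""
    gap = Q[state, fact] - Q[state, foil]
    return (f"In state {state}, '{actions[fact]}' is preferred over "
            f"'{actions[foil]}' because its expected return is higher "
            f"by {gap:.2f}.")

# Hypothetical Q-table for a 2-state, 2-action robot navigation MDP.
Q = np.array([[5.0, 3.5],
              [1.0, 4.0]])
msg = contrastive_explanation(Q, 0, 0, 1, ["go-left", "go-right"])
```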


Representation Balancing MDPs for Off-policy Policy Evaluation

Liu, Yao, Gottesman, Omer, Raghu, Aniruddh, Komorowski, Matthieu, Faisal, Aldo A., Doshi-Velez, Finale, Brunskill, Emma

Neural Information Processing Systems

We study the problem of off-policy policy evaluation (OPPE) in RL. In contrast to prior work, we consider how to estimate both the individual policy value and the average policy value accurately. We draw inspiration from recent work in causal reasoning and propose a new finite-sample generalization error bound for value estimates from MDP models. Using this upper bound as an objective, we develop a learning algorithm for an MDP model with a balanced representation, and show that our approach can yield substantially lower MSE on common synthetic benchmarks and an HIV treatment simulation domain.
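The objective the abstract describes, a model-fit term plus a term that balances the learned representation across the behavior and target state distributions, can be caricatured in a few lines. Everything below is a schematic under invented data: a linear representation, a mean-embedding distance standing in for the bound's distributional penalty, and a fixed (untrained) weight matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

def linear_mmd(phi_b, phi_t):
    """Squared distance between mean feature embeddings -- a simple
    integral probability metric used here as the balancing penalty."""
    return float(np.sum((phi_b.mean(axis=0) - phi_t.mean(axis=0)) ** 2))

# Hypothetical data: states visited under the behavior vs. target policy
# (the target distribution is shifted, so the raw states are unbalanced).
s_behavior = rng.normal(0.0, 1.0, size=(256, 4))
s_target = rng.normal(0.5, 1.0, size=(256, 4))

def objective(W, X_b, y_b, X_t, alpha=1.0):
    """Model fit on behavior data plus a representation-balancing term,
    mirroring the bound's two pieces: factual error + distribution gap."""
    phi_b, phi_t = X_b @ W, X_t @ W        # shared linear representation
    mse = np.mean((phi_b.sum(axis=1) - y_b) ** 2)
    return mse + alpha * linear_mmd(phi_b, phi_t)

W = rng.normal(size=(4, 3))
y = s_behavior.sum(axis=1)                 # synthetic regression target
loss = objective(W, s_behavior, y, s_target)
```

Minimizing such an objective over W trades prediction accuracy on the behavior data against how distinguishable the two distributions remain in the learned representation, which is the balancing idea the bound formalizes.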


Before we can find a model, we must forget about perfection

Dobrev, Dimiter

arXiv.org Artificial Intelligence

With Reinforcement Learning we assume that a model of the world exists, and furthermore that the model in question is perfect (i.e. it describes the world completely and unambiguously). This article demonstrates that it does not make sense to search for the perfect model, because such a model is too complicated and practically impossible to find. We show that we should abandon the pursuit of perfection and pursue Event-Driven (ED) models instead. These models are a generalization of Markov Decision Process (MDP) models. This generalization is essential, because nothing can be found without it. Rather than a single MDP, we aim to find a raft of neat, simple ED models, each describing a simple dependency or property. In other words, we replace the search for a singular, complex perfect model with a search for a large number of simple models.